Complete de novo assembly of Wolbachia endosymbiont of Diaphorinacitri Kuwayama (Hemiptera: Liviidae) using long-read genome sequencing

Wolbachia, a gram-negative \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm{\alpha }$$\end{document}α-proteobacterium, is an endosymbiont found in some arthropods and nematodes. Diaphorina citri Kuwayama, the vector of ‘Candidatus Liberibacter asiaticus’ (CLas), are naturally infected with a strain of Wolbachia (wDi), which has been shown to colocalize with the bacteria pathogens CLas, the pathogen associated with huanglongbing (HLB) disease of citrus. The relationship between wDi and CLas is poorly understood in part because the complete genome of wDi has not been available. Using high-quality long-read PacBio circular consensus sequences, we present the largest complete circular wDi genome among supergroup-B members. The assembled circular chromosome is 1.52 megabases with 95.7% genome completeness with contamination of 1.45%, as assessed by checkM. We identified Insertion Sequences (ISs) and prophage genes scattered throughout the genomes. The proteins were annotated using Pfam, eggNOG, and COG that assigned unique domains and functions. The wDi genome was compared with previously sequenced Wolbachia genomes using pangenome and phylogenetic analyses. The availability of a complete circular chromosome of wDi will facilitate understanding of its role within the insect vector, which may assist in developing tools for disease management. This information also provides a baseline for understanding phylogenetic relationships among Wolbachia of other insect vectors.


Result and discussion
wDi genome assembly. The purpose of this study was to obtain an enclosed Wolbachia genome from D.
citri. Recently, we published wDi genomes from a single collection point of the same wDi culture used in this study, which were near complete but could not be circularized 21 . The sequencing of obligate endosymbionts such as Wolbachia is not an easy task because of their very low abundance, inability to grow outside a host, and inability to culture axenically 22 . In addition, collection of large amounts of high-quality DNA for whole genome sequencing requires a large quantity of bacteria. This requires a high number of infected host cells to obtain the obligate endosymbiont bacteria 22 . Thus, in this study, wDi samples were collected from combination of two collection points (cell passages) from the same culture to obtain high quality wDi DNA for whole genome sequencing. An overview of wDi extraction and genome assembly pipeline is shown in Fig. 1.
To produce a high-quality assembly, circular consensus sequences (CCS) were used. CCS are derived accurately from the noisy individual subreads which are consensus sequence obtained from multiple passes of a single template molecule 23,24 . The raw PacBio sequencing data obtained from the SMRT cells produced 899,643 filtered subreads and a total of approximately four billion bases, with the longest subread length of 118 kb. High quality CCS reads upto 32 kb size were generated from raw PacBio reads for high quality assembly. The maximum number of CCS reads (> 4,000) generated from using SMRT® LINK v7.0 using Sequel II system were of high quality with Q60 (Fig. 2a,b). Further, 45-bp left adapter sequences were trimmed from CCS reads. In addition, the short reads < 1000 bp and worst 10% of read bases were discarded to ensure high-quality assembly with the coverage of 72.89 × . We utilized pacbio-hifi parameter in Canu v1.9 to solve the complexity of Wolbachia genomes and www.nature.com/scientificreports/ generate complete assembly with overlapping ends that can be trimmed for circularization. Pacbio-hifi, recently integrated in Canu v1.9 provides high repeat resolution than pacbio-corrected at least on complex genomes like Wolbachia 18 . By default, Canu v1.9 with pacbio-hifi option uses only overlaps that are below 0.03% error which is much lower than used with pacbio-corrected option. In this study, we applied an even lower rate, corrected-ErrorRate = 0.001, that reduces the risk for the mis-assembly. Before trimming, the assembled genome size was 1,530,940 bp. The genomes after circularization were checked for potential errors using Illumina sequencing data. Firstly, the quality of trimmed Illumina data was ensured using FastQC to determine the data quality using various quality metrics. Phred quality scores per-base for the sample was higher than 30 and GC content of 33%, following a normal distribution. The Illumina data provided median coverage of 925 × for the sample. The analyses corrected 91 SNPs, 10 small insertions totaling 73 bases, and three small deletions totaling 41 bases. The de novo assembled genome after correction was 1,528,786 bp in size with an average GC content of 34.08% (Table 1). The complete genome is longer than the previously reported draft contigs of wDi which was estimated to be 1.25 Mb 12 . The wDi genome is largest among assembled Wolbachia genomes as compared with other Wolbachia from arthropods and nematodes. Previously, the largest Wolbachia genomes were from Folsomia candida (1.8 Mb) 19 , invasive cherry fruit fly Rhagoletis cingulata (1.53 Mb) 20 and embryos of Aedes albopictus (1.48 Mb) 25 .
Genome annotations and assessments. The wDi genome was annotated including protein coding genes, 5S, 16S, and 23S rRNA and tRNA genes. An overview of their genome features, including CDSs, rRNAs, and tRNAs was visualized in CG view Server (Fig. 3). PGAP annotations showed assembled wDi chromosome to contain total of 1,435 genes which are 1,394 coding sequences with 1,202 protein coding genes. Forty-one genes are related to RNAs (three RNAs, 34 tRNAs, and four noncoding RNAs) and 192 are pseudogenes. We compared the complete wDi chromosome with the draft wDi in various perspectives using various tools implemented in Microscope platform 26 . The core genes and genome specific genes was identified comparing wDi_AMZJ.1 12 based on Microscope gene families with parameter of 80% amino acid identity and 80% alignment coverage. A total of 1,073 genes were shared between two wDi genomes, while 239 and 183 genes were specific to wDi assembled in this study and wDi_AMZJ.1 12 , respectively, based on single transitive links (single linkage) with alignment coverage constraints and implemented in a software package (called SiLiX for SIngle LInkage Clustering of Sequences) ( Figure S1; Table S1). Notably, dnaK (fragment of chaperone protein), metC (fragment of cystathionine beta-lyase/L-cysteine desulfhydrase), ylbg (putative DNA-binding transcriptional regulator), insF (transposase), rpoC (fragment of RNA polymerase subunit beta), kefB (fragment of K +: H + antiporter) constituted the largest fraction of genes in complete wDi. However, the Microscope platform's gene phyloprofile analysis revealed that homologs for those genes exist in draft wDi, with homology constraints of identity greater than or equal to 35 percent (Table S2). In complete wDi, tandem duplications revealed 36 locations containing 286 genes, whereas draft wDi revealed just 20 regions involving 64 genes. Tandem duplicated genes have an identity ≥ 35% with a minLRap ≥ 0.8 and are separated by a maximum of five consecutive genes. It is evident that tandem duplications play major role in expansion of gene families 27 . In addition, the comparison between complete and draft wDi was done using lineplots, dotplots, and mauve alignment. The lineplot showed the strand conservation and inversions in the syntenic regions and shows high prevalence of transposases and insertion sequences throughout the complete wDi genome that are absent in the draft wDi (Fig. 4a). The dot plot shows the breaks and inversions when compared to the draft wDi (Fig. 4b). Mauve alignment showed some regions in the complete wDi genome whose locally collinear blocks (LCBs) were absent in the draft wDi (Fig. 4c). Each LCB is a homologous sequence region shared by two or more of the genomes under investigation and does not contain any homologous sequence rearrangements 28 . We also looked at a number of critical elements such as  Liberibacter asiaticus was also found in the complete wDi genome (GZ065_v1_1041). The BUSCO completeness scores of assembled wDi genome was also compared to Wolbachia reference genomes using bacteria_odb10 database (calculated in this study). The BUSCO completeness of the final assembled wDi genome showed 80.6% as compared to other reference Wolbachia genomes wOo (78.2%), wOv (78.2%), wFol (81.5%), wAlbB (84.7%), wBm (79.8%), wOo (78.2%), wMau (83.9%), wMel (83.1%), wPip (86.3%) and wRi (83.9%) suggesting similar number of 'complete and single-copy' genes recovered in wDi genome compared to reference Wolbachia genomes and is typical and reliable for comparative genomics among Wolbachia genomes 25 ( Figure S2). It has been suggested that even the complete genomes of Wolbachia miss up to 9 to 25 genes from the BUSCO set because of their endosymbiotic lifestyle which makes genes redundant, and these genes probably are not missing from the assemblies and annotations 29 . The final assembled wDi genome showed 94.0% completeness when the subset database, rickettsiales_odb10 was used for the BUSCO analysis. In addition, the checkM completeness of the assembled wDi genome was 95.73% with 1.45% contamination. The checkM completeness and contamination falls within the range of ≥ 95% complete with ≤ 5% contamination that makes excellent reference genome for analysis 30,31 . The checkM contamination of the previously published complete wFol genome (1.8 Mb) 19 was 1.82% (calculated in this study) which was assembled from filtered reads obtained from F. candida genome that was sequenced using PacBio sequencing technology (Table S3). In addition, the taxonomy to Wolbachia sp. was confirmed using Centrifuge v1.0.3 tool that showed all sequences belonging to Wolbachia species.
Insertion sequences (ISs), prophage genes, ORF7 and Ankyrin proteins. Insertion sequences are bacterial class-II transposons that are capable of replication and can spread throughout the genome using cut-and-paste mechanism 32 . ISs are classified into about 20 families and play key role in genome evolution 32,33 . Specifically, 10% of the Wolbachia genomes consist of insertion sequence elements 34 . A total of 138 ORFs related to ISs were found in the wDi genome, belonging to 14 different IS families ( Figure S3; Table S4). The most represented IS families were IS982 (28 copies; 20.3%), IS481 (26 copies; 18.8%), and IS110 (25 copies; 18.1%). Although the ISs in the wDi genome are diverse, they have less ORFs than in the entire circular wAlbB (CP031221) chromosome belonging to supergroup B, which has nine IS families and 216 ORFs associated to IS Prophages are subjected to selective pressure from their hosts, resulting in a variety of partial DNA genomic abnormalities such as recombination, gene loss, and progressive disintegration 35 . The prophage genes are dynamic elements that mediate horizontal gene transfer and are widespread in Wolbachia genomes 36,37 . Defective genomic prophages, also known as cryptic prophages, are virions that have lost their ability to generate virions and lyse host cells 35,38 . The most major difference between intact and cryptic WO is that intact WO possesses a rather complete gene module that codes for head, baseplate, and tail proteins, allowing it to generate active virions 39 . The prophage regions in the wDi genome showed five regions (four intact and one incomplete or cryptic) sized 55.8 kb, 23.1 kb, 32.2 kb, 11.9 kb and 34.6 kb containing 64, 33, 21, 18, and 24 proteins, respectively ( Figure S2; Table S5). Altogether, prophage region constituted total of 164 prophage-associated loci scattered in four intact and one incomplete regions with the combined size of 137.9 kb (10.3%) in the wDi genome. Based on the existence of all genomic structures (phage attachment sites, genes encoding structural phage proteins, and genes coding for proteins involved in DNA regulation, insertion to the host genome, and lysis), the four entire WOwDi phages have the ability to create virions. One cryptic WOwDi sized 11.8 kb (location: 522,438-534,303) lacks phage baseplate and tail assembly proteins. wDi genome supports widely held belief that Wolbachia with cryptic prophages usually has at least one intact WO prophage 40 . This shows the expansion of the prophage region when compared to other supergroup B members such as wAlbB_CP031221 (1.47%), wNo (4.09%), wMau (4.07%) and but comparable to wPip (1.48 Mb genome size) with 9.25% prophage sequences (with only one 59.8 kb sized intact prophage region with other four cryptic prophage regions) (Fig. 5). Surprisingly, PHASTER analysis revealed two cryptic prophage regions of 6.4 kb and 15.4 kb in wAlbB_CP031221 without the presence of intact   www.nature.com/scientificreports/ prophage region. However, four WO-like islands (designated wAlbB WO like island 01 through wAlbB WO like island 04) and 19 prophage-associated loci (13 CDS, 6 pseudogenes) were discovered by BLAST comparisons to several WO phages totaling 111 prophage-associated loci with a combined size of 116 kb (8%) without active prophages 25 . Other Wolbachia genome only with cryptic prophages were found in group A member, wWpum (Wolbachia in Wiebesia pumilae) 39 having no ability to produce active virions. The WO prophage areas are sometimes used in cytoplasmic incompatibility genetic investigations 41 . The BLASTp searches of WOMelB WD0631 (NCBI accession number AAS14330.1) and WD0632 (AAS14331.1) in Microscope platform for CifA and CifB protein sequences, respectively 41 found no homologs in the wDi strain for CifA but a few for CifB. Among CifB hits using HHpred 42 , GZ065_v1_1517, GZ065_v1_0240 follow Module B-1 (ModB-1 with PDDEXK nuclease family, and various other restriction endonucleases such as NucS, HSDR_N, and MmeI), and GZ065_v1_0695, GZ065_v1_0696, GZ065_v1_0704 follow Module B-3 [with ubiquitin-modification (Ulp-1) and protease-like domains (Sentrin-specific protease)] 41 .
In addition, the wDi genome revealed the presence of four different minor capsid gene ORF7 paralogs (GZ065_00870, GZ065_01245, GZ065_01575, and GZ065_6965) ( Figure S3) as in Nasonia vitripennis A Wolbachia 37 which are present in the four different prophage sequence regions. The protein domain annotations of the assembled genomes showed 57 (4.0%) proteins in the wDi genome to contain at least one copy of an ankyrin repeat domain (Figures S3; Table S6) which is comparable to ANK proteins wMel, wRi, and wPip with about 4% of the total genes 43 . These ANK proteins of about 33 amino acids play significant role in interactions between host and symbionts 34,44 and are found abundantly in genes of WO-prophage 44 .
Many contemporary hypotheses propose that obligate endosymbionts should have limited genome sizes 45 , similar to Wolbachia strains in filarial nematodes, which contain no or few insertion sequences, transposable elements, and prophage sequences, due to their obligate association with the host 46 . Recent study have shown that the genome of the obligatory wFol 29 strain, on the other hand, is the biggest complete Wolbachia genome ever identified, with 1,801,626 base pairs (bp) and highly enriched in repeated and mobile elements (124 transposases, 96 ankyrin repeat proteins, 34 DNA-repair genes, and 19 resolvases). In wDi too, the genome is highly enriched in repeated and mobile elements (109 transposases, 57 proteins with ankyrin repeats, 14 DNA repair proteins, and six resolvases) than other supergroup-B members 29 . All known Wolbachia strains are in a similar transitional stage, in which they are primarily vertically transferred and do not exist in specialized structures 47 . As a result, their genome size is expected to vary depending on the host 47 .
COG, eggNOG, and pfam annotations. COG automatic classification revealed 1,092 CDSs classified in at least one COG group in the wDi genome (Table S7). eggNOG annotations of protein coding genes assigned functions to 1,221 protein coding genes (Table S8). The top five pathways were related to "replication, recombination and repair", "translation, ribosomal structure and biogenesis", "energy production and conversion", "posttranslational modification, protein turnover, chaperones", and "coenzyme transport and metabolism". The Pathway Tools was used to observe whether the metabolic pathways were complete or not. The analysis showed 40 complete metabolic pathways and 62 incomplete metabolic pathways (Table S9) (Table S10).

Toxin-antitoxin system and Type IV Secretion SSystem (T4SS) genes. Toxin-antitoxin (TA)
systems are genetic components that consist of a toxin gene (proteins) and its antitoxin counterpart (protein or non-coding RNAs). In bacteria various processes, like translation, replication, cytoskeleton development, membrane integrity, and cell wall biosynthesis are affected by TA toxins 48 . PGAP annotation in the wDi genome revealed the presence of Type II RelE/ParE toxin genes, GZ065_00055, GZ065_03670 (pseudogene) and one Type II RatA family toxin gene, GZ065_04425. Based on the BLASTp search using wPip antitoxin gene, WP_007302904.1, we identified GZ065_00050 as a possible antitoxin gene for RelE toxin. Type II RatA family toxin gene, GZ065_04425 was situated immediate to ssrS noncoding RNA gene (Rfam RF00013), separated by fewer than 18 nucleotides. Previously, RelE/ParE and RatA/ssrS toxin-antioxin modules were also reported in wCle, wFol, wPip, wMel, wRi, wAu, wHa, wNo 49 .
Genes related to the Type IV Secretion System (T4SS) are another important group represented in Wolbachia. Bacteria utilize T4SSs to proliferate and survive inside the host secreting protein effectors, protein-DNA complexes 50 . The wDi genome revealed the presence of 14 genes associated to T4SSs (Table S11). These genes were organized in two operons in each wDi genome. Operon 1 contains virB8, virB9-1, virB10, virB11, and virD4.  Operon 2 contains virB3, virB4, virB6-1, and virB6-2. The virB2 and virB7 genes were found to be scattered elsewhere in the genomes. Interestingly, we found both virB2 (three copies) and virB7 (one copy) genes in the wDi genome. These genes have been reported as absent among Wolbachia and most members of the order Rickettsiales 51,52 . However, recent studies have shown the presence of virB2 gene (pilus component) in Wolbachia pipientis from Ae. albopictus (wAlbB) 25 , Wolbachia from Laodelphax striatellus 53 , Candidatus Wolbachia bourtzisii (wDacA), Wolbachia pipientis wDacB from Dactylopius coccus 54 , and Wolbachia from Muscidifurax uniraptor (wUni) 55 . In addition, the virB7 gene (pilus-associated protein) was previously observed only in Wolbachia from Laodelphax striatellus (wStri) 53 . Bing et al. 53 also showed wDi clustered together with wStri with a strong support in a monophyletic clade and suggested that these strains shared the same ancestor. www.nature.com/scientificreports/ study resulted three bins that were unique to wDi genomes. The Bin_1 consisted of 58 gene clusters with 127 genes common in both complete and incomplete wDi_AMZJ.1 12 genomes, Bin_2 consisted of 29 gene clusters with 62 genes that were unique to the complete wDi genome, and Bin_3 consisted of 12 gene clusters with 13 genes that were unique to incomplete wDi_AMZJ.1 12 genome (Fig. 6a, Table S12). The largest fraction of genes in three bins constituted Ankyrin repeat proteins (n = 28; play important role in interactions between host and symbionts) and IS4 transposase (n = 11; play role in DNA mobility using "cut and paste" mechanism), chromosome segregation ATPases (n = 5; play important role in chromosome condensation and segregation during cytoplasmic incompatibility in male insects), curved DNA-binding protein CbpA, containing a DnaJ-like domain (n = 2; act as a molecular chaperone in an adaptive response to environmental stresses other than heat shock), DNA repair protein RadC (n = 2), DNA-directed RNA polymerase (n = 2), RecA-family ATPase (n = 6) , REP element-mobilizing transposase (n = 2), transcriptional regulator with XRE-family HTH domain (n = 2), Mg/Co/Ni transporter MgtE (n = 2; important in inorganic ion transport and metabolism) and rest were conserved protein with unknown function.

Comparative genomics of wDi with reference
The ANI values among the wDi genome and reference Wolbachia genomes indicated the similarity in the range of 82% (supergroup D-wBm) to 95% (supergroup B-wAlbB) and 99.8% to incomplete wDi_AMZJ.1 12 genome (Fig. 6b). OrthoFinder assigned 21,264 genes (96.3% of total) to 1,924 orthogroups (Table S13) in the 15 Wolbachia genomes. There were 626 orthogroups with all species present and 407 of these consisted entirely of single-copy genes (Fig. 6c). The analysis showed 43 orthogroups unique to complete and draft wDi genomes. www.nature.com/scientificreports/ Phylogenetics of wDi and other Wolbachia genomes. The IQ-TREE v 1.6.8 tool was used to construct a ML phylogenetic tree using the concatenated protein sequences of single copy genes including ribosomal proteins of reference Wolbachia genomes obtained from NCBI database (Table S14) with the wDi genome. The single copy genes were utilized instead of multilocus sequence typing loci (gatB, coxA, hcpA, fbpA, and ftsZ) 5 8 which are problematic in phylogenetic analyses and may not accurately represent the properties of different Wolbachia strains 59 . The advent of sequencing technology and availability of complete and draft genomes of Wolbachia, recent phylogenetic studies have been done utilizing single copy gene sets 53,59,60 rather than wholegenome sequence typing 61 . Although comparisons of whole Wolbachia genome sequences is useful for strain differentiation, diversity estimates, and phylogenetic analyses, the size is cumbersome and not necessary to answer specific questions that can be addressed using genetic marker loci 59 . The obtained tree (Fig. 7) indicated that the wDi genome belonged to supergroup-B Wolbachia strains (wVulC, wCon, wLug, wBta, wStri, wAlbB, wDacB, wLcl, wNo, wMau, wAus, Ob_Wba, wBol1-b, wMeg, and wPip) and made a clade with wStri (the Wolbachia from Korean Laodelphax striatellus population) and wStri_1 (the Wolbachia from Chinese L. striatellus population).
Wolbachia are supergrouped (A, B, E-H), the Wolbachia endosymbionts of arthropods belong to supergroup-A and -B and of filarial nematodes belong to supergroup-C and -D 8,62 . wPpe belongs to supergroup-L 63 , whereas wCfeT strain is ancestrally to most other Wolbachia lineages (used as an outgroup) 64 . The phylogenetic analysis by Saha et al. 12 also indicated that Wolbachia from D. citri belongs to supergroup-B using FtsZ and Wsp genes.

Conclusions
The genome sequence of the Wolbachia culture isolated from D. citri was completely assembled and compared with other Wolbachia genomes available in the NCBI database. This study is in accordance with the study by Sinha et al. 25 , which demonstrated that high quality, complete Wolbachia genome assemblies can be achieved from long-read sequences of high coverage without enrichment, such as through Large Enriched Fragment Targeted Sequencing 67 and other target genome enrichment techniques 68,69 . In this study, we used DNA from an axenic Wolbachia cultures for whole genome sequencing rather than filtering Wolbachia sequence reads from the whole insect genome sequence. The latter, referred to as a metagenomic sequencing approach, is a frequent practice that generates low coverage reads for Wolbachia genome assembly 70,71 . Recent integration of the pacbio-hifi option in Canu (HiCanu) facilitates generation of complete assemblies consisting of repeat resolution on complex genomes like that of Wolbachia rather than pacbio-corrected assemblies in previous versions. In addition, concatenated protein sequences of single copy genes generated using hmm source from Campbell et al. 65 delineated supergroup-B Wolbachia of D. citri from other supergroups. The availability of a complete circular genome of the D. citri endosymbiont, Wolbachia, will facilitate the development of endosymbiont-mediated strategies for pest and disease management. This study expands the list of complete Wolbachia reference genomes that can be useful in studying evolutionary relationships among Wolbachia of arthropods and nematodes.

Materials and methods
Extraction of Wolbachia from D. citri (wDi). D. citri were collected from a laboratory culture established in 2005 from a population collected in Polk Co. (28.0′ N, 81.9′ W), Lake Alfred, Florida, USA. Individual psyllids were placed on sterile diet rings for two days prior to Wolbachia extraction. The surface sterilized psyllid was homogenized in 1.0 mL of Schneider's Drosophila (S2) medium (catalog number 21720024, Gibco) followed by centrifugation at 100 × g for five minutes. The supernatant was further centrifuged at 400 × g for five minutes to pellet wDi with insect debris. The pellet was resuspended with 1.0 mL of S2 medium separate wDi from impurities. The samples were centrifuged at 100 × g for five minutes to pellet impurities, and the supernatant was transferred to a new tube. The final centrifuge step was conducted at 4000 × g for five minutes, and the pelleted wDi was resuspended in fresh 1.0 mL of S2 media. www.nature.com/scientificreports/ Figure 7. Phylogenetic relationship of Wolbachia genomes using concatenated protein sequences of single copy genes obtained from each genome using hmm source of single copy genes by Campbell et al. 65 . The total of 78 Wolbachia genomes including wDi genome sequenced in this study was used. The maximum likelihood tree was constructed using IQ-TREE v 1.6.8 66 using ultrafast bootsrap mode with 5000 iterations. Branch support was estimated using the Shimodaira-Hasegawa (SH)-like approximate likelihood ratio test with 1,000 replicates.

Infection of wDi in S2 cells and isolation of wDi
The amino acid substitution model HIVb + F + I + G4 was used and wFol was set as the outgroup. The bootstrap values > 50% are shown at the respective node. The Wolbachia supergroups are color coded which are shown in color ranges. www.nature.com/scientificreports/ Long-read (PacBio) sequencing. Sequencing of wDi gDNA was performed on six replicate samples (five samples are not included in this study). wDi gDNA (4-8 µg in 150 µl TE) was sheared down to 10 kb using Covaris g-TUBES (catalog number 520079, Covaris Inc.), using two passes at 7,000 rpm. The resulting size of the fragments was verified on the TapeStation Genomic DNA ScreenTape (Agilent Technologies). Barcoded, 10 kb insert-size libraries were constructed using 600-700 ng of pure and fragmented (10 kb) from each bacterial sample using the protocol of PacBio for multiplex SMRT sequencing of bacterial genomes (PacBio Manual PN 101-069-200-02) in conjunction with barcodes from the Barcoded Adaptor Kit 8A (PacBio PN 101-081-300). Briefly, the library construction reactions consisted of the following sequential steps: ExoVII treatment, DNA Damage Repair, End Repair and Blunt-end ligation of barcoded SMRT bell adaptors. After ligation, samples were pooled, purified using AMPure, and treated with ExoIII/ExoVII to eliminate excess adaptors and any damaged DNA. This procedure resulted in ~ 800 ng of adaptor ligated SMRT bell library. The final library was further size selected in the SageELF™ instrument (catalog number ELD7510), using 0.75% agarose gel cassettes and the 1-18 kb v2 cassette definition program. The desired SageELF™ fractions in the 5-20 kb range, averaging 10 kb (TapeStation) were cleaned using AMPure magnetic beads (0.6:1.0 beads to sample ratio) and eluted in 15 μl of 10 nM Tris HCl, pH 8.0. The library size selection by ELF step yielded 126 ng of ready-to-sequence material. Sequencing was performed on the PacBio SEQUEL instrument using the Chemistry 3.0 reagents in combination with the SMRT® LINK v 6.0 software. The library was added on the PacBio SEQUEL sample plate at 8 pM by diffusion-loading and 224 min pre-extension time for sequencing in LR-SMRT cells with 20-h data collection. All other steps for sequencing were done according to the recommended protocol by PacBio sequencing calculator.
Short-read (Illumina) sequencing. The gDNA samples for Illumina sequencing were fragmented using the Covaris to 400 bp following the manufacturer recommended protocol. The genomic libraries were constructed using 100 ng as the input and the NEBNext Ultra II DNA library prep kit for Illumina (New England Biolabs). Three PCR cycles were performed with each library prior to library validation using the TapeStation High Sensitivity D5000 ScreenTape (Agilent Technologies). Libraries were quantified using the Qubit 1 × dsDNA HS Assay kit (ThermoFisher Scientific) and molar concentration was calculated to pool the libraries in equimolar ratios. The pool was then quantified and 14 pM was loaded into the MiSeq flow cell. The run was set as a 300 paired-end run using the 600-cycles v3 kit.
De novo genome assembly. PacBio CCS were generated using SMRT® LINK v7.0 using Sequel II system.
Genome correction. The PacBio-only assembled genome can have a high probability of indel errors 78 .
Therefore, the assembled genome was checked for potential errors using Illumina data obtained from respective samples using the Pilon error-detection and correction tool 79 80 . The quality of trimmed reads was assessed using FastQC v0.11.7 81 . After cleaning, the reads were mapped to the PacBio chromosome using bwa v0.7.17 82 using pair-end mode. The indexed bam output file obtained from bwa was utilized for indel correction using Pilon v1.22 79 .
Genome annotations and assessments. Genome annotation was done using the standard NCBI Prokaryotic Genome Annotation Pipeline (PGAP) 83 and Microscope platform 26 . PGAP annotations are available at NCBI GenBank. The annotations from Microscope platform were used for some comparative studies and mentioned when discussed below (represented by GZ065_v1_n  22 and wRec 103 . The previously published, non-circular wDi genomes wDi_AMZJ.1 12 was also included in the comparison. The pangenome analyses were performed using anvio v5.5.0 57 (http:// meren lab. org/ softw are/ anvio/). The taxonomy was assigned using Centrifuge v1.0.3 104 . The COGs to the reference genomes were assigned using program 'anvi-run-ncbi-cogs' . The program 'anvi-pan-genome' was used following flags and parameters: '-use-ncbi-blast' , '-minbit 0.5' , and '-mcl-inflation 5' for the wDi genome and reference genomes. The similarity between the wDi and reference genomes were calculated using 'anvi-compute-ani' which utilizes PyANI 56 in ' ANIb' mode to compute average nucleotide identity across the genomes. The orthogroups across the wDi and reference genomes were identified using Orthofinder v2.4.0 105 and common orthogroups across multiple genomes were visualized via UpSet plot using Intervene (https:// asnte ch. shiny apps. io/ inter vene/) 106 .

Phylogenetic analysis.
We constructed two maximum likelihood phylogenetic trees in different scale. The phylogenetic analysis was performed using protein sequences hits obtained via 'anvi-get-sequences-for-hmmhits, ' which utilizes the hidden markov model (hmm) source from Campbell et al. 65 using 139 single copy genes including 48 ribosomal genes. One small scale phylogenetic tree was constructed using seventeen complete Wolbachia chromosomes for studying and visualizing the abundance and variations of Insertion and prophage sequences. For big scale phylogenetic tree, seventy-seven Wolbachia genomes (taxid: 953) were downloaded from the NCBI database using command ncbi-genome-download to perform the phylogenetic analysis with wDi genome. The concatenated protein sequences of single copy genes were aligned using MUSCLE 107 and were subjected to ModelFinder 108 for RAxML tree using Bayesian Information Criterion (BIC). The best amino acid substitution model was used for construction of maximum likelihood phylogenetic tree using IQ-TREE v1.6.8 66 using ultrafast bootstrap mode with 5000 iterations. Branch support was estimated using the Shimodaira-Hasegawa (SH)-like approximate likelihood ratio test with 1,000 replicates. Modelfinder and IQ-TREE was integrated in a PhyloSuite v1.2.2 software 109 The rerooting, labeling, and color coding of the phylogenetic tree was performed using iTOL v5.7 (https:// itol. embl. de/) 110 .

Data availability
The accessions SRR10985324, and SRR11075881 under Bioproject PRJNA603775 connected with biosample SAMN13940805 have been deposited at the NCBI. The assembled genome and annotations have been deposited at the NCBI GenBank database under the accession CP048819. All the supplemental materials have been uploaded in Figshare: https:// doi. org/ 10. 6084/ m9. figsh are. 14397 131. Figure S1. Venn diagram showing common and genome specific genes between complete wDi and draft wDi_AMZJ.1 genome. Figure S2. BUSCO assessment of the completeness of wDi genomes with reference sequences. Figure S3. Circos plot representation of various features in the wDi genome. The wDi genome is represented by the outer circle. The first, second, third, fourth and fifth inner circle represents the track for IS elements, Ankyrin genes, T4SS genes, prophage sequences, and ORF7 sequences, respectively in the wDi genome. Table S1 shows list of complete and draft wDi genome specific genes. Table S2 shows list of orthologs of complete and draft wDi using annotation from Microscope platform. Table S3 shows list of Wolbachia genomes sequenced and assembled using different technology and assembly tools. Table S4 shows Insertion Sequences (ISs) in the wDi genome. Table S5 shows prophage statistics in the wDi genome. Table S6 shows list of Ankyrin genes in the wDi genome. Table S7 shows COG automatic classification of protein coding genes in the wDi genome. Table S8 shows eggNOG annotations of protein coding genes in the wDi genome. Table S9 shows Metabolic pathways analysis in the wDi genome. Table S10 shows Pfam domain annotations for the wDi proteins of the wDi genome. Table S11 shows list of genes related to Type IV Secretion System in the wDi genome. Table S12 shows summary of Wolbachia Pan gene clusters. Table S13 shows Orthogroup analyses. Table S14 shows list of Wolbachia genome assemblies downloaded from the NCBI database, consisting of 139 single copy genes including 48 ribosomal genes from Campbell et al. 65 used for the hidden markov model (hmm) source, concatenated protein sequences, and phylogenetic tree construction file. www.nature.com/scientificreports/ Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.